NSF PAR Search | NSF Public Access Repository

ProDy 2.0: increased scale and scope after 10 years of protein dynamics modelling with Python

https://doi.org/10.1093/bioinformatics/btab187

Zhang, She; Krieger, James M; Zhang, Yan; Kaya, Cihan; Kaynak, Burak; Mikulska-Ruminska, Karolina; Doruker, Pemra; Li, Hongchun; Bahar, Ivet (April 2021, Bioinformatics)
Cowen, Lenore (Ed.)
Abstract. Summary: ProDy, an integrated application programming interface developed for modelling and analysing protein dynamics, has significantly evolved in recent years in response to the growing data and needs of the computational biology community. We present major developments that led to ProDy 2.0: (i) improved interfacing with databases and parsing new file formats, (ii) SignDy for signature dynamics of protein families, (iii) CryoDy for collective dynamics of supramolecular systems using cryo-EM density maps and (iv) essential site scanning analysis for identifying sites essential to modulating global dynamics. Availability and implementation: ProDy is open-source and freely available under MIT License from https://github.com/prody/ProDy. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
DCI: learning causal differences between gene regulatory networks

https://doi.org/10.1093/bioinformatics/btab167

Belyaeva, Anastasiya; Squires, Chandler; Uhler, Caroline (March 2021, Bioinformatics)
Cowen, Lenore (Ed.)
Abstract. Summary: Designing interventions to control gene regulation necessitates modeling a gene regulatory network by a causal graph. Currently, large-scale gene expression datasets from different conditions, cell types, disease states, and developmental time points are being collected. However, application of classical causal inference algorithms to infer gene regulatory networks based on such data is still challenging, requiring high sample sizes and computational resources. Here, we describe an algorithm that efficiently learns the differences in gene regulatory mechanisms between different conditions. Our difference causal inference (DCI) algorithm infers changes (i.e. edges that appeared, disappeared, or changed weight) between two causal graphs given gene expression data from the two conditions. This algorithm is efficient in its use of samples and computation since it infers the differences between causal graphs directly without estimating each possibly large causal graph separately. We provide a user-friendly Python implementation of DCI and also enable the user to learn the most robust difference causal graph across different tuning parameters via stability selection. Finally, we show how to apply DCI to single-cell RNA-seq data from different conditions and cell states, and we also validate our algorithm by predicting the effects of interventions. Availability and implementation: Python package freely available at http://uhlerlab.github.io/causaldag/dci. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
Prediction of histone post-translational modifications using deep learning

https://doi.org/10.1093/bioinformatics/btaa1075

Baisya, Dipankar Ranjan; Lonardi, Stefano (December 2020, Bioinformatics)
Cowen, Lenore (Ed.)
Abstract. Motivation: Histone post-translational modifications (PTMs) are involved in a variety of essential regulatory processes in the cell, including transcription control. Recent studies have shown that histone PTMs can be accurately predicted from the knowledge of transcription factor binding or DNase hypersensitivity data. Similarly, it has been shown that one can predict PTMs from the underlying DNA primary sequence. Results: In this study, we introduce a deep learning architecture called DeepPTM for predicting histone PTMs from transcription factor binding data and the primary DNA sequence. Extensive experimental results show that our deep learning model achieves higher prediction accuracy than the model proposed in Benveniste et al. (PNAS 2014) and DeepHistone (BMC Genomics 2019). The competitive advantage of our framework lies in the synergistic use of deep learning combined with an effective pre-processing step. Our classification framework has also enabled the discovery that the knowledge of a small subset of transcription factors (which are histone-PTM and cell-type-specific) can provide almost the same prediction accuracy that can be obtained using all the transcription factors data. Availability and implementation: https://github.com/dDipankar/DeepPTM. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
RecBic: a fast and accurate algorithm recognizing trend-preserving biclusters

https://doi.org/10.1093/bioinformatics/btaa630

Liu, Xiangyu; Li, Di; Liu, Juntao; Su, Zhengchang; Li, Guojun (July 2020, Bioinformatics)
Cowen, Lenore (Ed.)
Abstract. Motivation: Biclustering has emerged as a powerful approach to identifying functional patterns in complex biological data. However, existing tools are limited in their accuracy and efficiency when recognizing various kinds of complex biclusters submerged in ever-larger datasets. We introduce a novel, fast and highly accurate algorithm, RecBic, to identify various forms of complex biclusters in gene expression datasets. Results: We designed RecBic to identify various trend-preserving biclusters, particularly those with narrow shapes, i.e. clusters where the number of genes is larger than the number of conditions/samples. Given a gene expression matrix, RecBic starts with a column seed and grows it into a full-sized bicluster by simply repetitively comparing real numbers. When tested on simulated datasets in which the elements of implanted trend-preserving biclusters and those of the background matrix have the same distribution, RecBic was able to identify the implanted biclusters in a nearly perfect manner, outperforming all the compared salient tools in terms of accuracy and robustness to noise and overlaps between the clusters. Moreover, RecBic also showed superiority in identifying functionally related genes in real gene expression datasets. Availability and implementation: Code, sample input data and usage instructions are available at the following websites. Code: https://github.com/holyzews/RecBic/tree/master/RecBic/. Data: http://doi.org/10.5281/zenodo.3842717. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
Irreducibility of Recombination Markov Chains in the Triangular Lattice

Cannon, Sarah (May 2023, SIAM Conference on Applied and Computational Discrete Algorithms (ACDA23))
Berry, Jonathan; Shmoys, David; Cowen, Lenore; Naumann, Uwe (Ed.)
In the United States, regions (such as states or counties) are frequently divided into districts for the purpose of electing representatives. How the districts are drawn can have a profound effect on who's elected, and drawing the districts to give an advantage to a certain group is known as gerrymandering. It can be surprisingly difficult to detect when gerrymandering is occurring, but one algorithmic method is to compare a current districting plan to a large number of randomly sampled plans to see whether it is an outlier. Recombination Markov chains are often used to do this random sampling: randomly choose two districts, consider their union, and split this union up in a new way. This approach works well in practice and has been widely used, including in litigation, but the theory behind it remains underdeveloped. For example, it's not known if recombination Markov chains are irreducible, that is, if recombination moves suffice to move from any districting plan to any other. Irreducibility of recombination Markov chains can be formulated as a graph problem: for a planar graph G, is the space of all partitions of G into κ connected subgraphs (κ districts) connected by recombination moves? While the answer is yes when districts can be as small as one vertex, this is not realistic in real-world settings where districts must have approximately balanced populations. Here we fix district sizes to be κ1 ± 1 vertices, κ2 ± 1 vertices,… for fixed κ1, κ2,…, a more realistic setting. We prove for arbitrarily large triangular regions in the triangular lattice, when there are three simply connected districts, recombination Markov chains are irreducible. This is the first proof of irreducibility under tight district size constraints for recombination Markov chains beyond small or trivial examples. The triangular lattice is the most natural setting in which to first consider such a question, as graphs representing states/regions are frequently triangulated. 
The proof uses a sweep-line argument, and there is hope it will generalize to more districts, triangulations satisfying mild additional conditions, and other redistricting Markov chains.
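The recombination move described in this abstract (pick two districts, merge them, split the union in a new way) can be illustrated with a minimal sketch. Several choices here are assumptions for illustration and are not taken from the abstract: the region is a small square grid rather than a triangular lattice, the two districts are required to be adjacent, the union is re-split by cutting one edge of a random spanning tree, and simple connectivity of districts is not checked. The names `grid_graph`, `random_spanning_tree`, `recom_step` and `is_connected` are hypothetical helpers.

```python
import random
from collections import defaultdict


def grid_graph(rows, cols):
    """Adjacency dict of a rows x cols grid, a toy stand-in for a real region graph."""
    adj = defaultdict(set)
    for r in range(rows):
        for c in range(cols):
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols:
                    adj[(r, c)].add((rr, cc))
                    adj[(rr, cc)].add((r, c))
    return dict(adj)


def random_spanning_tree(nodes, adj, rng):
    """Spanning tree of the induced subgraph via randomized DFS (not uniformly random)."""
    nodes = set(nodes)
    root = rng.choice(sorted(nodes))
    parent = {root: None}  # insertion order equals discovery order
    stack = [root]
    while stack:
        v = stack[-1]
        fresh = sorted(u for u in adj[v] if u in nodes and u not in parent)
        if fresh:
            u = rng.choice(fresh)
            parent[u] = v
            stack.append(u)
        else:
            stack.pop()
    return parent


def recom_step(plan, adj, targets, rng, tol=1):
    """Merge two adjacent districts and re-split them; keep the plan if no valid cut exists."""
    pairs = sorted({tuple(sorted((plan[u], plan[v])))
                    for u in adj for v in adj[u] if plan[u] != plan[v]})
    a, b = rng.choice(pairs)
    merged = [v for v in plan if plan[v] in (a, b)]
    parent = random_spanning_tree(merged, adj, rng)
    # Subtree sizes: children are discovered after parents, so sweep in reverse.
    size = dict.fromkeys(parent, 1)
    for v in reversed(list(parent)):
        if parent[v] is not None:
            size[parent[v]] += size[v]
    # Cutting tree edge (v, parent[v]) is valid if both sides are within tol of their targets.
    cuts = []
    for v, p in parent.items():
        if p is None:
            continue
        for d_sub, d_rest in ((a, b), (b, a)):
            if (abs(size[v] - targets[d_sub]) <= tol
                    and abs(len(merged) - size[v] - targets[d_rest]) <= tol):
                cuts.append((v, d_sub, d_rest))
    if not cuts:
        return plan
    v, d_sub, d_rest = rng.choice(cuts)
    children = defaultdict(list)
    for u, p in parent.items():
        if p is not None:
            children[p].append(u)
    subtree, stack = set(), [v]
    while stack:
        x = stack.pop()
        subtree.add(x)
        stack.extend(children[x])
    new_plan = dict(plan)
    for u in merged:
        new_plan[u] = d_sub if u in subtree else d_rest
    return new_plan


def is_connected(nodes, adj):
    """True if the induced subgraph on `nodes` is connected."""
    nodes = set(nodes)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(u for u in adj[x] if u in nodes)
    return seen == nodes


rng = random.Random(0)
adj = grid_graph(6, 6)
targets = {0: 12, 1: 12, 2: 12}  # three districts, each allowed 12 +/- 1 vertices
plan = {(r, c): c // 2 for r in range(6) for c in range(6)}  # start from vertical strips
for _ in range(200):
    plan = recom_step(plan, adj, targets, rng)
```

Because each side of a spanning-tree cut is itself a tree, both new districts stay connected by construction. Whether every balanced plan can be reached by such moves is exactly the irreducibility question the abstract addresses; this sketch does not answer it.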
Full Text Available
Fast First-Order Methods for Monotone Strongly DR-Submodular Maximization

Sadeghi, Omid; Fazel, Maryam (January 2023, Proceedings of SIAM Conference on Applied and Computational Discrete Algorithms)
Berry, Jonathan; Shmoys, David; Cowen, Lenore; Naumann, Uwe (Ed.)
Continuous DR-submodular functions are a class of functions that satisfy the Diminishing Returns (DR) property, which implies that they are concave along non-negative directions. Existing works have studied monotone continuous DR-submodular maximization subject to a convex constraint and have proposed efficient algorithms with approximation guarantees. However, in many applications, e.g., computing the stability number of a graph and mean-field inference for probabilistic log-submodular models, the DR-submodular function has the additional property of being strongly concave along non-negative directions, which could be utilized for obtaining faster convergence rates. In this paper, we first introduce and characterize the class of strongly DR-submodular functions and show how such a property implies strong concavity along non-negative directions. Then, we study L-smooth monotone strongly DR-submodular functions that have bounded curvature, and we show how to exploit such additional structure to obtain algorithms with improved approximation guarantees and faster convergence rates for the maximization problem. In particular, we propose the SDRFW algorithm that matches the provably optimal approximation ratio after only iterations, where c ∈ [0,1] and μ ≥ 0 are the curvature and the strong DR-submodularity parameter. Furthermore, we study the Projected Gradient Ascent (PGA) method for this problem and provide a refined analysis of the algorithm with an improved approximation ratio (compared to ½ in prior works) and a linear convergence rate. Given that both algorithms require knowledge of the smoothness parameter L, we provide a novel characterization of L for DR-submodular functions showing that in many cases, computing L could be formulated as a convex optimization problem, i.e., a geometric program, that could be solved efficiently. Experimental results illustrate and validate the efficiency and effectiveness of our algorithms.
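As a rough illustration of the Projected Gradient Ascent method mentioned in this abstract, the sketch below runs plain PGA with step size 1/L on a toy objective. Everything specific is an assumption for illustration: the quadratic f(x) = a·x − ½ xᵀHx with entrywise non-negative H (which makes f DR-submodular, and monotone on the unit box for the chosen a), the budget constraint set, and the helper names `f`, `grad`, `project` and `projected_gradient_ascent`. It is neither the paper's SDRFW algorithm nor its refined analysis, and the toy objective is not claimed to be strongly DR-submodular in the paper's sense.

```python
def f(x, a, H):
    """Quadratic f(x) = a.x - 0.5 x^T H x; DR-submodular when every H[i][j] >= 0."""
    n = len(x)
    quad = sum(H[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return sum(a[i] * x[i] for i in range(n)) - 0.5 * quad


def grad(x, a, H):
    """Gradient a - H x (H is assumed symmetric)."""
    n = len(x)
    return [a[i] - sum(H[i][j] * x[j] for j in range(n)) for i in range(n)]


def project(y, budget):
    """Euclidean projection onto {0 <= x <= 1, sum(x) <= budget}, budget >= 0."""
    def clip(shift):
        return [min(1.0, max(0.0, yi - shift)) for yi in y]

    if sum(clip(0.0)) <= budget:
        return clip(0.0)
    # Otherwise the budget is tight: bisect on the shift until sum(clip(shift)) = budget.
    lo, hi = 0.0, max(y)  # sum(clip(lo)) > budget, sum(clip(hi)) == 0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(clip(mid)) > budget:
            lo = mid
        else:
            hi = mid
    return clip(hi)  # hi is always on the feasible side


def projected_gradient_ascent(a, H, budget, L, iters=200):
    """Iterate x <- project(x + grad f(x) / L); return the final point and objective history."""
    x = [0.0] * len(a)
    history = [f(x, a, H)]
    for _ in range(iters):
        g = grad(x, a, H)
        x = project([xi + gi / L for xi, gi in zip(x, g)], budget)
        history.append(f(x, a, H))
    return x, history


# Toy instance: gradient a - H x stays non-negative on [0, 1]^3, so f is monotone there.
a = [2.0, 2.5, 3.0]
H = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
L = max(sum(abs(h) for h in row) for row in H)  # row-sum bound on the spectral norm of H
x_star, history = projected_gradient_ascent(a, H, budget=1.5, L=L)
```

Here L is taken from a simple row-sum bound, which is valid for this symmetric quadratic; the abstract's characterization of L via a geometric program is a different and more general route that this sketch does not attempt.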
Full Text Available
